Data Integration in Vector (Vertically Partitioned) Databases

Authors

  • Peter Buneman
  • Jonathan Riecke
  • Eric Sandler
  • Vladimir Seroff
Abstract

Data integration requires not only the absorption of data, but also the transformation of that data. For instance, manufacturing companies must combine customer order information both for ordering supplies and for financial planning; large banks may need to produce consolidated profit-and-loss statements daily, or even more frequently, to manage liquidity and risk. The data may be spread across many heterogeneous databases located in different countries, with different standards of cleanliness, and expressed in different currencies or units of measure. The transformation rules may also change over time when, for instance, companies merge or split, or new accounting rules are imposed. Clearly it is best if the data integration tool is flexible enough not only to accommodate new sources of data, but also to change the rules for combining that data. This paper describes one data integration tool, Aleri Inc.’s Modeler. We focus primarily on one aspect of Modeler: the use of vectorization both as an implementation technique and as the fundamental unit of computation. Vectorization improves both the efficiency and the ease of use of Modeler.


Similar resources

Agglomerative clustering on vertically partitioned data

Mining distributed databases is emerging as a fundamental computational problem. A common approach for mining distributed databases is to move all of the data from each database to a central site and build a single model. Privacy concerns in many application domains prevent sharing of data, which limits the ability of data mining technology to identify patterns and trends from large amounts of data. Tradit...


Privacy-Preserving Datamining on Vertically Partitioned Databases

In a recent paper Dinur and Nissim considered a statistical database in which a trusted database administrator monitors queries and introduces noise to the responses with the goal of maintaining data privacy [5]. Under a rigorous definition of breach of privacy, Dinur and Nissim proved that unless the total number of queries is sub-linear in the size of the database, a substantial amount of noi...


Transforming Complex Methods in Vertically Partitioned OO Databases

In vertically partitioned object-oriented database systems, a partitioning scheme should support transparency to applications so that existing applications need not be rewritten. Also, the processes of schema design should be independent of partitioning. This can be achieved by transforming methods in a user-defined schema for correct execution in the partitioned domain. We present the transforma...


Privacy-Preserving Analysis of Vertically Partitioned Data Using Secure Matrix Products

Reluctance of statistical agencies and other data owners to share their possibly confidential or proprietary data with others who own related databases is a serious impediment to conducting mutually beneficial analyses. In this paper, we propose a protocol for securely computing matrix products in vertically partitioned data, i.e., the data sets have the same subjects but disjoint attributes. T...


Comparing path-based and vertically-partitioned RDF databases

Given the increasing prevalence of RDF data formats for storing and sharing data on the Semantic Web, efficient storage mechanisms for RDF data are also becoming increasingly important. We survey existing storage solutions for RDF data in an RDMS. Two recent and novel storage concepts open the door for significantly better querying efficiency. The first, proposed by Matono, et al (2005), models...



Journal title:
  • IEEE Data Eng. Bull.

Volume 25  Issue 

Pages  -

Publication date 2002